Abstract

A new distance function dist(A, B) for fuzzy sets A and B is introduced.
It is based on descriptive complexity, i.e., the number of bits (on
average) that are needed to describe an element in the symmetric
difference of the two sets. The distance gives the amount of additional
information needed to describe either one of the two sets given the other.
We prove its mathematical properties and perform pattern clustering on
data based on this distance.

1 Introduction

The notion of distance between two objects is very general. Metrics and
distances have become an essential tool in many areas of mathematics
and its applications, including geometry, probability, statistics, coding and
graph theory, data analysis, and pattern recognition. For a comprehensive
source on this subject see [4]. The notion of a fuzzy set was introduced in [8].
It is a class of objects with continuous values of membership and hence
extends the classical definition of a set (to distinguish the classical set from
a fuzzy set we refer to it as a crisp set). Formally, a fuzzy set is a pair
$(E, m)$ where $E$ is a set of objects and $m$ is a membership function
$m : E \to [0, 1]$. Fuzzy set theory can be used in a wide range of domains in
which information is incomplete or imprecise, such as pattern recognition
and decision theory. The concepts of distance and similarity are important
in the area of fuzzy logic and sets. We now review some common ways of
defining distances on fuzzy sets (see [9] and references therein).

Classical distances measure how far apart two points are in Euclidean space. For
instance, the Minkowski distance between two points $x$ and $y$ in $\mathbb{R}^n$ is defined
as

\[
d_r(x, y) := \left( \sum_{i=1}^{n} |x_i - y_i|^r \right)^{1/r}, \qquad r \ge 1. \tag{1}
\]
Let $E$ be a finite set and let $\mathcal{F}(E)$ be the set of all fuzzy subsets of $E$. Consider
two fuzzy subsets $A, B \in \mathcal{F}(E)$ with membership functions $m_A, m_B : E \to [0, 1]$.
Then (1) can be extended to the following distance,

\[
d_r(A, B) := \left( \sum_{x \in E} |m_A(x) - m_B(x)|^r \right)^{1/r}, \qquad r \ge 1.
\]
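The Minkowski-type distance between two fuzzy sets can be sketched in a few lines of Python. This is a minimal illustration, assuming that each fuzzy set is stored as a list of membership values over the same finite set $E$; the function name `minkowski_fuzzy` is hypothetical and not part of the original text.

```python
def minkowski_fuzzy(m_a, m_b, r=2.0):
    """Minkowski-type distance d_r between two membership vectors.

    m_a, m_b: membership values in [0, 1], listed over the same finite set E.
    r: exponent, required to be at least 1.
    """
    if len(m_a) != len(m_b):
        raise ValueError("membership vectors must have equal length")
    if r < 1:
        raise ValueError("r must be at least 1")
    # Sum the r-th powers of the pointwise differences, then take the r-th root.
    return sum(abs(a - b) ** r for a, b in zip(m_a, m_b)) ** (1.0 / r)


A = [0.0, 0.5, 1.0]
B = [0.0, 1.0, 0.0]
print(minkowski_fuzzy(A, B, r=1))  # sum of absolute differences
print(minkowski_fuzzy(A, B, r=2))  # Euclidean distance between the vectors
```

With $r = 1$ the value is the sum of the absolute membership differences, and with $r = 2$ it is the Euclidean distance between the two membership vectors.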



Based on (1), letting $r = 2$, we have the Hausdorff distance between two non-empty
compact crisp sets $U, V \subset \mathbb{R}$,

\[
q(U, V) := \max \left\{ \sup_{u \in U} \inf_{v \in V} d_2(u, v), \; \sup_{v \in V} \inf_{u \in U} d_2(u, v) \right\}. \tag{2}
\]

This can be extended to fuzzy sets as follows: let $A \in \mathcal{F}(E)$ be a fuzzy set
and denote by $A_\alpha$ the $\alpha$-level set of the fuzzy set $A$, which is defined as
$A_\alpha = \{x \in E \mid m_A(x) \ge \alpha\}$. Then for two fuzzy subsets $A, B \in \mathcal{F}(E)$ the distance in
(2) can be extended to the following distance between $A$ and $B$,

\[
q(A, B) := \int_0^1 q(A_\alpha, B_\alpha) \, d\alpha.
\]



Another approach is based on set-theoretic distance functions. For a fuzzy
set $A \in \mathcal{F}(E)$ define the cardinality of $A$ as $|A| = \sum_{x \in E} m_A(x)$. Extend the
intersection and union operations by defining the membership functions

\[
m_{A \cap B}(x) := \min\{m_A(x), m_B(x)\}
\]

and

\[
m_{A \cup B}(x) := \max\{m_A(x), m_B(x)\}.
\]

Then for fuzzy sets $A, B \in \mathcal{F}(E)$ we may define the distance function

\[
d(A, B) := 1 - \frac{|A \cap B|}{|A \cup B|} = 1 - \frac{\sum_{x \in E} m_{A \cap B}(x)}{\sum_{x \in E} m_{A \cup B}(x)}.
\]
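The set-theoretic construction can be illustrated with a short Python sketch. It assumes that the distance is the ratio form $1 - |A \cap B| / |A \cup B|$ built from the min and max membership functions; this reading is an assumption, since only the denominators of the formula are legible in the source. All function names are hypothetical, and the convention for two empty sets is a choice made only for this sketch.

```python
def fuzzy_cardinality(m):
    """|A| = sum of the membership values."""
    return sum(m)


def fuzzy_intersection(m_a, m_b):
    """Membership of A ∩ B: pointwise minimum."""
    return [min(a, b) for a, b in zip(m_a, m_b)]


def fuzzy_union(m_a, m_b):
    """Membership of A ∪ B: pointwise maximum."""
    return [max(a, b) for a, b in zip(m_a, m_b)]


def set_theoretic_distance(m_a, m_b):
    """Assumed form: 1 - |A ∩ B| / |A ∪ B|."""
    union = fuzzy_cardinality(fuzzy_union(m_a, m_b))
    if union == 0:
        return 0.0  # both sets empty; convention chosen for this sketch only
    inter = fuzzy_cardinality(fuzzy_intersection(m_a, m_b))
    return 1.0 - inter / union


A = [0.2, 0.8]
B = [0.5, 0.5]
print(set_theoretic_distance(A, B))
```

For crisp sets this reduces to one minus the ratio of the sizes of the intersection and the union.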

Another distance is based on four features of a fuzzy set. Let the domain of
interest be $\mathbb{R}$ and consider a fuzzy set $A$ in $\mathcal{F}(\mathbb{R})$. The power of $A$ (which extends
the notion of cardinality) is defined as

\[
\mathrm{power}(A) := \int_{-\infty}^{\infty} m_A(x) \, dx.
\]

Let $S(x) = -x \ln x - (1 - x) \ln(1 - x)$, then define the entropy of $A$ as

\[
\mathrm{entropy}(A) := \int_{-\infty}^{\infty} S(m_A(x)) \, dx.
\]

Define the centroid as

\[
c(A) := \frac{\int_{-\infty}^{\infty} x \, m_A(x) \, dx}{\mathrm{power}(A)}
\]

and the skewness as

\[
\mathrm{skew}(A) := \int_{-\infty}^{\infty} (x - c(A))^3 \, m_A(x) \, dx.
\]

Let $v(A) = [\mathrm{power}(A), \mathrm{entropy}(A), c(A), \mathrm{skew}(A)]$. Then [2] defines the distance
between two fuzzy sets $A, B \in \mathcal{F}(\mathbb{R})$ as the Euclidean distance $\|v(A) - v(B)\|$.
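The feature-based distance can be approximated numerically. The sketch below is only an illustration under stated assumptions: the integrals over the real line are truncated to a finite interval `[lo, hi]` that contains the support of the membership function, they are evaluated with a composite trapezoidal rule, and the set is assumed to have non-zero power. The helper names (`integrate`, `features`, `feature_distance`, `triangle`) are hypothetical.

```python
import math


def S(t):
    """S(t) = -t ln t - (1 - t) ln(1 - t), with S(0) = S(1) = 0."""
    if t <= 0.0 or t >= 1.0:
        return 0.0
    return -t * math.log(t) - (1.0 - t) * math.log(1.0 - t)


def integrate(f, lo, hi, n=2000):
    """Composite trapezoidal rule on [lo, hi] with n sub-intervals."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h


def features(m, lo, hi):
    """Return [power, entropy, centroid, skew] of a membership function m."""
    power = integrate(m, lo, hi)  # assumed non-zero
    entropy = integrate(lambda x: S(m(x)), lo, hi)
    centroid = integrate(lambda x: x * m(x), lo, hi) / power
    skew = integrate(lambda x: (x - centroid) ** 3 * m(x), lo, hi)
    return [power, entropy, centroid, skew]


def feature_distance(m_a, m_b, lo, hi):
    """Euclidean distance between the two feature vectors."""
    va, vb = features(m_a, lo, hi), features(m_b, lo, hi)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(va, vb)))


def triangle(center, half_width):
    """Triangular membership function (an illustrative choice)."""
    return lambda x: max(0.0, 1.0 - abs(x - center) / half_width)


print(feature_distance(triangle(0.0, 1.0), triangle(1.0, 1.0), -4.0, 4.0))
```

Two triangular sets that differ only by a shift have the same power, entropy and skewness, so in this example the distance is driven by the difference of the centroids.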

We now proceed to discuss the notion of distances that are based on descrip- 
tive complexity of sets. 



2 Information based distances 

A good distance is one which picks out only the 'true' dissimilarities and ig- 
nores factors that arise from irrelevant variables or due to unimportant random 
fluctuations that enter the measurements. In most applications the design of a 
good distance requires inside information about the domain, for instance, in the 
field of information retrieval [1] the distance between two documents is weighted 
largely by words that appear less frequently since the words which appear more 
frequently are less informative. 

Typically, different domains require the design of different distance functions 
which take such specific prior knowledge into account. It can therefore be an 
expensive process to acquire expertise in order to formulate a good distance. 
Recently, a new distance for sets was introduced [7] which is based on the 
concept of descriptional complexity (or discrete entropy) . A description of an 
object in a finite set can be represented as a finite binary string which provides 
a unique index of the object in the set. The description complexity of the 
object is the minimal length of a string that describes the object. The distance 
of [7] is based on the idea that two sets should be considered similar if given 
the knowledge of one the additional complexity in describing an element of 
the other set is small (this is also referred to as the conditional combinatorial 
entropy, see [5, 6] and references therein). The advantage of this formulation of
distance is its universality: it can be applied without any prior knowledge
or assumption about the domain of interest, that is, about the elements that the
sets contain. Such a distance can be viewed as an information-based distance since
the conditional descriptional complexity is essentially the amount of information
needed to describe an element in one set given that we know the other set (for
more on the notion of combinatorial information and entropy see [5, 6]).

In the current paper we introduce a distance function between two general 
sets, i.e., sets that can be crisp or fuzzy. Following the information-based ap- 
proach of [7] we resort to entropy as the main operator that gives the measure 
of dissimilarity between two sets. We use the membership of the symmetric 
difference of two sets as the probability of a Bernoulli random variable whose 
entropy is the expected description complexity of an element that belongs to 
only one of the two sets. Thus the distance function measures how many bits 
(on average) are needed to describe an element in the symmetric difference of 
the two sets. In other words, it is the amount of additional information needed 
to describe any one of the two sets given knowledge of the other. 

Being a description-complexity based distance gives it certain characteristic
properties. For instance, the distance between a crisp set $A$ and its complement
$\bar{A}$ is zero since no additional information is needed to describe
one of these two sets when the other is known. That is, knowledge of a set $A$
automatically implies knowledge of the set $\bar{A}$ and vice versa. The two sets are
clearly not equal, but our distance function renders them as similar as
two sets can be (zero distance apart).

The next section formally introduces the distance and in Theorem 2 we prove 
its metric properties. 



3 Distance function 

We write w.p. for "with probability". Let $[N] = \{1, \ldots, N\}$ be a domain of
interest. Let $A \in \mathcal{F}([N])$ be a set with membership function $m_A : [N] \to [0, 1]$.
We use $x$ to denote a value in $[N]$. Given two fuzzy subsets $A, B \in \mathcal{F}([N])$ with
membership functions $m_A(x)$, $m_B(x)$, as mentioned in section 1 we denote by

\[
m_{A \cup B}(x) := \max\{m_A(x), m_B(x)\}
\]

and

\[
m_{A \cap B}(x) := \min\{m_A(x), m_B(x)\}.
\]

Denote by $A \Delta B = (A \cup B) \setminus (A \cap B)$ the symmetric difference between crisp
sets $A, B$. For fuzzy sets $A, B \in \mathcal{F}([N])$ define

\[
m_{A \Delta B}(x) := m_{A \cup B}(x) - m_{A \cap B}(x).
\]

Define a sequence of Bernoulli random variables $X_A(x)$ for $x \in [N]$ taking the
value 1 w.p. $m_A(x)$ and the value 0 w.p. $1 - m_A(x)$. Denote by $H(X_A(x))$ the
entropy of $X_A(x)$,

\[
H(X_A(x)) := -m_A(x) \log m_A(x) - (1 - m_A(x)) \log(1 - m_A(x)).
\]

Define the random variable

\[
X_{A \Delta B}(x) :=
\begin{cases}
1 & \text{w.p. } m_{A \Delta B}(x) \\
0 & \text{w.p. } 1 - m_{A \Delta B}(x).
\end{cases}
\]

We define a new distance between $A, B \in \mathcal{F}([N])$ as

\[
\mathrm{dist}(A, B) := \frac{1}{N} \sum_{x=1}^{N} H(X_{A \Delta B}(x)).
\]
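The definition translates directly into code. The following Python sketch assumes that fuzzy sets on $[N]$ are stored as lists of $N$ membership values and that logarithms are taken to base 2, which matches the interpretation of the distance in bits; the function names are hypothetical.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) variable, with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def sym_diff_membership(m_a, m_b):
    """Membership of the symmetric difference: max minus min at each point."""
    return [max(a, b) - min(a, b) for a, b in zip(m_a, m_b)]


def dist(m_a, m_b):
    """Average entropy of the symmetric-difference indicator over the domain."""
    if len(m_a) != len(m_b):
        raise ValueError("both sets must be defined on the same domain")
    delta = sym_diff_membership(m_a, m_b)
    return sum(binary_entropy(d) for d in delta) / len(delta)


A = [0.0, 0.0, 1.0, 1.0]
B = [0.5, 0.0, 1.0, 0.5]
print(dist(A, B))  # 0.5: two of the four points contribute one bit each
```

In the example the symmetric difference has membership 0.5 at two of the four points and 0 at the others, so the average entropy is half a bit.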

Remark 1. This definition can easily be extended to the case of an infinite
domain, for instance, a subset of the real line. In that case the distance can be
defined as the expected value $\mathbb{E}\, H(X_{A \Delta B}(\xi))$ where $\xi$ is a random variable
with some probability distribution $P(\xi)$ with respect to which the expectation
is computed.
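One way to approximate the expectation in Remark 1 is by Monte Carlo sampling. The sketch below is an illustration only: it assumes base-2 logarithms, membership functions given as Python callables, and a user-supplied sampler for $\xi$; the name `expected_dist` and its parameters are hypothetical.

```python
import math
import random


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) variable, with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def expected_dist(m_a, m_b, sample_xi, n_samples=10000, seed=0):
    """Monte Carlo estimate of E[H(X_{A Δ B}(ξ))].

    m_a, m_b: membership functions mapping a point to [0, 1].
    sample_xi: function that draws one value of ξ from a random.Random instance.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        xi = sample_xi(rng)
        # |m_A - m_B| equals max - min, the symmetric-difference membership.
        total += binary_entropy(abs(m_a(xi) - m_b(xi)))
    return total / n_samples


# Illustrative choice: ξ uniform on [0, 1], m_A(x) = x, m_B(x) = 0.
estimate = expected_dist(lambda x: x, lambda x: 0.0, lambda rng: rng.random())
print(estimate)
```

For this particular choice the exact expectation is $\int_0^1 H(x)\,dx = 1/(2 \ln 2)$ bits, so the estimate should be close to that value.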

The next theorem shows that the distance satisfies the metric properties. 

Theorem 2. The function $\mathrm{dist}(A, B)$ is a semi-metric on $\mathcal{F}([N])$, i.e., it is
non-negative, symmetric, it equals zero if $A = B$ and it satisfies the triangle
inequality.

Remark 3. Note that the function $\mathrm{dist}(A, B)$ may equal zero even when $A \ne B$.
We now prove Theorem 2.



Proof. Since the entropy function is non-negative (see for instance, [3]), for
any two subsets $A, B \in \mathcal{F}([N])$ we have $\mathrm{dist}(A, B) \ge 0$. The symmetry
property is satisfied since for every $x \in [N]$ we have $X_{A \Delta B}(x) = X_{B \Delta A}(x)$.
For every subset $A$ the value $\mathrm{dist}(A, A) = 0$ since $m_{A \Delta A}(x) = 0$ and hence
$H(X_{A \Delta A}(x)) = 0$ for all $x$.

Let us now show that the triangle inequality holds. Let $A, B, C$ be any
three elements of $\mathcal{F}([N])$. Fix any point $x \in [N]$ and without loss of generality
suppose that $m_A(x) \le m_B(x) \le m_C(x)$. Denote $p = m_B(x) - m_A(x)$
and $q = m_C(x) - m_B(x)$. Without loss of generality assume that $p \le q$.
Then we have $m_{A \Delta C}(x) = p + q$. Denote by $H(p)$, $H(q)$ and $H(p + q)$ the
entropies $H(X_{A \Delta B}(x))$, $H(X_{B \Delta C}(x))$ and $H(X_{A \Delta C}(x))$ respectively. We aim
to show that $H(p + q) \le H(p) + H(q)$. This will imply that for every $x \in [N]$,
$H(X_{A \Delta C}(x)) \le H(X_{A \Delta B}(x)) + H(X_{B \Delta C}(x))$ and hence it holds for the averages,

\[
\frac{1}{N} \sum_{x} H(X_{A \Delta C}(x)) \le \frac{1}{N} \sum_{x} H(X_{A \Delta B}(x)) + \frac{1}{N} \sum_{x} H(X_{B \Delta C}(x)).
\]

We start by considering the straight line function $\ell : [0, 1] \to [0, 1]$ defined
as

\[
\ell(z) := \frac{H(q) - H(p)}{q - p} \, z + H(p) - \frac{H(q) - H(p)}{q - p} \, p,
\]

which passes through the points $(z, \ell(z)) = (p, H(p))$ and $(z, \ell(z)) = (q, H(q))$.
For a function $f$ let $f'$ denote its derivative. We claim the following.

Claim 1. $H'(z) \le \ell'(z)$ for all $z \in [q, 1]$.

Proof: The derivative of $H(z)$ is $H'(z) = \log\left(\frac{1 - z}{z}\right)$. This is a decreasing
function on $[0, 1]$, hence it suffices to show that $H'(q) \le \ell'(z)$ for all $z \in [q, 1]$.
The derivative of $\ell(z)$ is $\frac{H(q) - H(p)}{q - p}$. So it suffices to show that

\[
\log\left(\frac{1 - q}{q}\right) \le \frac{H(q) - H(p)}{q - p}.
\]

This is equivalent to

\[
(q - p) \log\left(\frac{1 - q}{q}\right) \le H(q) - H(p). \tag{3}
\]

The left hand side of (3) can be expanded to

\[
q \log(1 - q) - q \log q - p \log(1 - q) + p \log q. \tag{4}
\]

Adding and subtracting the term $(1 - q) \log(1 - q)$ and using
$H(q) = -q \log q - (1 - q) \log(1 - q)$, (4) can be expressed as

\[
H(q) + p \log q + (1 - p) \log(1 - q).
\]

Substituting this for the left hand side of (3) and canceling $H(q)$ on both sides
gives the following inequality, which we need to prove:

\[
p \log q + (1 - p) \log(1 - q) \le p \log p + (1 - p) \log(1 - p).
\]

It suffices to show that

\[
p \log\left(\frac{p}{q}\right) + (1 - p) \log\left(\frac{1 - p}{1 - q}\right) \ge 0. \tag{5}
\]

That (5) holds follows from the information inequality (see Theorem 2.6.3 of [3]),
which lower bounds the divergence $D(P \| Q) \ge 0$, where $P, Q$ are two probability
functions and $D(P \| Q) = \sum_x P(x) \log \frac{P(x)}{Q(x)}$. Hence the claim is proved. ■

Next we claim the following:

Claim 2. $H(p + q) \le \ell(p + q)$.

Proof: Consider the case that $p \le q \le \frac{1}{2}$. Since $q \le \frac{1}{2}$, $H'(z)$ evaluated
at $z = q$ is non-negative. Hence both $H(z)$ and $\ell(z)$ are monotone increasing
on $q \le z \le \frac{1}{2}$ and $H(q) = \ell(q)$. By Claim 1, $\ell$ increases faster than $H$ on
$[q, 1]$, in particular on the interval $q \le z \le \frac{1}{2}$. Hence, for all $z \in [q, \frac{1}{2}]$ we have
$H(z) \le \ell(z)$. Now, if $p + q \in [q, \frac{1}{2}]$ then it follows that $H(p + q) \le \ell(p + q)$.
Otherwise it must hold that $p + q \in (\frac{1}{2}, 1]$. But $H$ is decreasing and $\ell$ is increasing
over this interval. Hence we have $\ell(z) \ge \ell(\frac{1}{2}) \ge H(\frac{1}{2}) \ge H(z)$ for $z \in (\frac{1}{2}, 1]$, in
particular for $z = p + q$, hence $\ell(p + q) \ge H(p + q)$. This proves the claim. ■

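From Claim 2 it follows that

\[
H(p + q) \le \ell(p + q) = \frac{H(q) - H(p)}{q - p} (p + q) + H(p) - \frac{H(q) - H(p)}{q - p} \, p.
\]

The right hand side equals $H(p) + \frac{H(q) - H(p)}{q - p} \, q$, so in order to obtain
$H(p + q) \le H(p) + H(q)$ it suffices to show that

\[
\frac{H(q) - H(p)}{q - p} \, q \le H(q) \tag{6}
\]

or equivalently,

\[
\frac{H(q)}{q} \le \frac{H(p)}{p}. \tag{7}
\]

Letting $f(z) = \frac{H(z)}{z}$ and differentiating we obtain

\[
f'(z) = \frac{\log(1 - z)}{z^2},
\]

which is non-positive for $z \in [0, 1]$. Hence $f$ is non-increasing over this interval.
Since by assumption $q \ge p$, it follows that $f(q) \le f(p)$ and (7) holds. This
completes the proof of the theorem. □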
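The two inequalities at the heart of the proof can be checked numerically. The following Python sketch is a sanity check on a grid, not a substitute for the proof; it assumes base-2 logarithms (the inequalities do not depend on the base), and the function names are hypothetical.

```python
import math


def H(p):
    """Binary entropy in bits, with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def max_subadditivity_violation(steps=200):
    """Largest value of H(p + q) - H(p) - H(q) over a grid with p + q <= 1."""
    worst = -math.inf
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            p, q = i / steps, j / steps
            worst = max(worst, H(p + q) - H(p) - H(q))
    return worst


def ratio_is_nonincreasing(steps=1000):
    """Check that f(z) = H(z) / z does not increase along a grid in (0, 1]."""
    values = [H(k / steps) / (k / steps) for k in range(1, steps + 1)]
    return all(b <= a + 1e-12 for a, b in zip(values, values[1:]))


print(max_subadditivity_violation())  # a value above 0 would contradict H(p+q) <= H(p) + H(q)
print(ratio_is_nonincreasing())
```

A positive first value or a `False` second value on any grid would signal an error in the argument.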

4 Simple Examples 

Let us evaluate this distance for a few examples. Consider a set $A$ and its
complement $\bar{A}$. Their membership functions satisfy the relation

\[
m_{\bar{A}}(x) = 1 - m_A(x),
\]

hence the membership function for the symmetric difference is

\[
m_{A \Delta \bar{A}}(x) = \max\{m_A(x), 1 - m_A(x)\} - \min\{m_A(x), 1 - m_A(x)\}.
\]

Note that for any $x \in [N]$ with a crisp membership value, i.e., $m_A(x) = 1$ or
$m_A(x) = 0$, we have $m_{A \Delta \bar{A}}(x) = 1$ and hence in this case $H(X_{A \Delta \bar{A}}(x)) = 0$.
This means that for a crisp set $A$ (for all $x \in [N]$, $m_A(x) \in \{0, 1\}$) our distance
has the following property (we call this the complement-property):

\[
\mathrm{dist}(A, \bar{A}) = 0.
\]

From an information theoretic perspective, this property is expected since knowing
a set $A$ automatically means that we also know how to describe its complement.
Hence there is no additional description necessary to describe $\bar{A}$ given $A$.
This is what $\mathrm{dist}(A, \bar{A}) = 0$ means.
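The complement-property can be observed on a small example. The Python sketch below assumes base-2 logarithms and list-valued membership functions; the helper names are hypothetical. It also shows that the property is specific to crisp sets: for a fuzzy set the distance to the complement is in general positive.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) variable, with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def dist(m_a, m_b):
    """Average entropy of the symmetric-difference indicator."""
    delta = [max(a, b) - min(a, b) for a, b in zip(m_a, m_b)]
    return sum(binary_entropy(d) for d in delta) / len(delta)


def complement(m):
    """Membership function of the complement: 1 - m(x)."""
    return [1.0 - v for v in m]


crisp = [1.0, 0.0, 0.0, 1.0]
fuzzy = [0.75, 0.25, 0.75, 0.25]

print(dist(crisp, complement(crisp)))  # 0.0: symmetric difference is 1 everywhere
print(dist(fuzzy, complement(fuzzy)))  # 1.0: symmetric difference is 0.5 everywhere
```

In the crisp case every point lies in the symmetric difference with certainty, so its indicator carries no information; in the fuzzy case shown, the indicator is a fair coin at every point.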

Let us now consider some examples of pairs of fuzzy sets and their distances.
Let $N = 20$ and the domain be $[N] = \{1, 2, \ldots, 20\}$. In the following examples
we plot the membership functions of several fuzzy sets. Note that we connect the
point values of the membership function by lines in order to make the plots
clearer (the actual membership functions are defined only on
the discrete set $[N]$).

Example 4. Consider the fuzzy sets $A, B, C$ and the complement $A^c$ with
membership functions as shown in Figure 1. Note that $A$ and its complement
are crisp sets. The distance matrix $D = [d_{ij}]$ is shown in (8); the rows and
columns correspond to $A$, $B$, $C$ and $A^c$ so that, for instance, the element
$d_{2,3} = \mathrm{dist}(B, C) = 0.709$. As can be seen, $C$ is a translated version of $B$ and they are
both the same distance from $A$. This is due to $H(X_{A \Delta B}(x)) = H(X_{A \Delta C}(x + 10))$.
$B$ and $C$ are farther apart than $B$ and $A$. Since $\mathrm{dist}(A, A^c) = 0$, each one
of $B$, $C$ is of the same distance to $A$ as to $A^c$.

\[
D = \begin{pmatrix}
0 & 0.354 & 0.354 & 0 \\
0.354 & 0 & 0.709 & 0.354 \\
0.354 & 0.709 & 0 & 0.354 \\
0 & 0.354 & 0.354 & 0
\end{pmatrix} \tag{8}
\]

Example 5. Continuing with the same domain as in Example 4, let us consider
the fuzzy sets $A, B, C$ and the complement $A^c$ with membership functions as
shown in Figure 2. The membership function of the set $C$ is now flat and, as the
distance matrix $D = [d_{ij}]$ in (9) shows, $C$ is now farther from $B$ (which has a
triangular membership function). As in the previous example, $B$ remains closer
to $A$ than to $C$.

\[
D = \begin{pmatrix}
0 & 0.354 & 0.5 & 0 \\
0.354 & 0 & 0.854 & 0.354 \\
0.5 & 0.854 & 0 & 0.5 \\
0 & 0.354 & 0.5 & 0
\end{pmatrix} \tag{9}
\]

Example 6. Continuing with the same domain as in Example 4, let us consider
the fuzzy sets $A, B, C$ and the complement $A^c$ with membership functions as
shown in Figure 3. Note that now $C$ is translated from $B$ by an amount that
is smaller compared to Example 4. As can be seen from the distance matrix in
(10), this results in a smaller distance $\mathrm{dist}(B, C) = 0.336$ compared to 0.709.
As in Example 4, $A$ is as similar to $B$ as to $C$ since
$\mathrm{dist}(A, B) = \mathrm{dist}(A, C) = 0.354$.

\[
D = \begin{pmatrix}
0 & 0.354 & 0.354 & 0 \\
0.354 & 0 & 0.336 & 0.354 \\
0.354 & 0.336 & 0 & 0.354 \\
0 & 0.354 & 0.354 & 0
\end{pmatrix} \tag{10}
\]



5 Clustering using the distance 



We tested the proposed distance function on real data. The data¹ consists of
answers from a survey given to the general population of 28 European countries.
There are ten questions in the survey, where a valid answer is a number in the
set $\{1, \ldots, 10\}$. The value 10 represents the most positive opinion and 1 the
most negative opinion (the name of each attribute is given in parentheses):

• trust in local parliament (country_GOV)

• trust in local politicians (politicians)

• trust in EU Parliament (EU_GOV)

• trust in United Nations (UN)

• trust in country's parliament (country_GOV)

• how satisfied with life (Life)

• how satisfied with the national government (National_GOV)

• immigration is bad or good (immigration)

• the state of health services (Health)

• how happy are you (happy)

After normalizing each component we represent each country as a fuzzy set
on a domain that consists of the ten attributes. Table 1 displays the membership
functions for each of the countries. Each row in this table represents
a membership function $m_i(x)$ of the fuzzy set $C_i$ of country $i$. Based on this
information we compute the distance $\mathrm{dist}(C_i, C_j)$ between every possible pair of
countries $C_i, C_j$ and obtain a distance matrix $D = [d_{ij}]$, $d_{ij} := \mathrm{dist}(C_i, C_j)$.
We use $D$ as the newly transformed version of the original data (Table 1) and
perform data clustering on it. The $i$-th row of $D$ is a feature vector representation of
country $i$. We use the k-means clustering procedure.
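The pipeline described above (membership rows, distance matrix, k-means on the rows of the matrix) can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used for the reported results: it uses only the seven rows of Table 1 that are fully legible in this copy, base-2 logarithms, a plain Lloyd-style k-means with a deterministic initialization, and $k = 2$ chosen arbitrarily for this small subset. All function names are hypothetical.

```python
import math

# Seven complete rows of Table 1 (ten membership values per country).
TABLE = {
    "Romania":     [0.45, 0.75, 1.00, 0.67, 0.45, 0.44, 0.77, 0.66, 0.30, 0.08],
    "Russian Fed": [0.49, 0.79, 0.95, 0.29, 0.49, 0.29, 0.92, 0.19, 0.22, 0.08],
    "Sweden":      [0.84, 0.70, 0.64, 0.90, 0.84, 0.85, 0.75, 0.73, 0.73, 0.47],
    "Slovenia":    [0.57, 0.74, 0.92, 0.51, 0.57, 0.64, 0.90, 0.33, 0.48, 0.24],
    "Slovakia":    [0.52, 0.69, 0.95, 0.61, 0.52, 0.52, 0.84, 0.30, 0.38, 0.19],
    "Turkey":      [0.87, 1.00, 0.99, 0.00, 0.87, 0.33, 1.00, 0.05, 0.60, 0.00],
    "Ukraine":     [0.00, 0.38, 0.94, 0.08, 0.00, 0.00, 0.00, 0.31, 0.00, 0.05],
}


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) variable, with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def dist(m_a, m_b):
    """Average entropy of the symmetric-difference indicator."""
    return sum(binary_entropy(abs(a - b)) for a, b in zip(m_a, m_b)) / len(m_a)


def distance_matrix(rows):
    """D[i][j] = dist(rows[i], rows[j])."""
    return [[dist(r, s) for s in rows] for r in rows]


def sq_euclid(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(points, k, max_iter=100):
    """Lloyd iterations; centers start at the first k points (arbitrary choice)."""
    centers = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_euclid(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


names = list(TABLE)
D = distance_matrix([TABLE[n] for n in names])
labels = kmeans(D, k=2)  # each row of D is the feature vector of one country
for name, lab in zip(names, labels):
    print(f"{name}: cluster {lab}")
```

The cluster assignments depend on the initialization and on the number of clusters, so the output of this sketch should not be compared directly with Figure 4, which is based on all 28 countries.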

Figure 4 shows the result of the k-means clustering where the horizontal
axis displays the cluster number and the vertical axis shows the distance of each 
point in a cluster to the mean of the cluster. In order to interpret this result, 
let us look at the fuzzy sets of some of the clusters. Figure 5 displays the fuzzy 
sets of cluster #1. As seen, the country Spain is considered similar to the rest 
of the countries in this cluster although its "happy" value is almost complement 
to the rest of the countries. This follows from the complement-property of our 
distance function (see section 4). 

Figure 6 displays the fuzzy sets of cluster #2. Ukraine seems to behave
almost the opposite of Denmark (except on the attribute EU_GOV, where both
have similar values). Ukraine and Turkey have interesting behaviors: they take
very similar values for the attributes country_GOV up to Life and on UN, while
on the rest of the attributes they are almost mutually complementary. Hence
according to our distance they are considered close (which is why they are
placed in the same cluster).

¹ The data set is the European Social Survey Round 4 Data (2008). Data file edition 3.0.
Norwegian Social Science Data Services, Norway - Data Archive and distributor of ESS data,
http://ess.nsd.uib.no/.

Figure 7 shows the fuzzy sets of several countries in cluster #3. Hungary
and the Russian Federation take very similar values and hence are close. Israel
versus Hungary or versus the Russian Federation has a similar behavior on the
attributes country_GOV, EU_GOV, happy, National_GOV, politicians, UN, while
on Health, Immigration, Life it has almost the complement values. Hence,
overall, our distance function renders Israel as close to Hungary and Russia.

We also ran a clustering procedure which is a variant of the Kohonen Self 
Organizing Map. The results that we obtained are very similar to those obtained 
by the k-means procedure.

6 Conclusion 

This paper introduces a new distance function dist(A, B) for fuzzy sets A, B
based on their descriptive complexity. The distance is shown to be a semi- 
metric that satisfies the triangle inequality. In comparison to other existing 
distance-functions for fuzzy sets this new metric is proportional to the additional 
amount of information needed to describe fuzzy set A when knowing fuzzy set 
B or vice versa. It thus has a natural information-based interpretation. Doing 
pattern clustering based on this distance we have shown that fuzzy sets that are 
clustered together tend to be more mutually informative. This is an interesting 
new property that can be useful for analyzing other data sets. 
[...]             …     …    0.81  0.50  0.39  0.34  0.72  0.58  0.38  0.16
Romania          0.45  0.75  1.00  0.67  0.45  0.44  0.77  0.66  0.30  0.08
Russian Fed      0.49  0.79  0.95  0.29  0.49  0.29  0.92  0.19  0.22  0.08
Sweden           0.84  0.70  0.64  0.90  0.84  0.85  0.75  0.73  0.73  0.47
Slovenia         0.57  0.74  0.92  0.51  0.57  0.64  0.90  0.33  0.48  0.24
Slovakia         0.52  0.69  0.95  0.61  0.52  0.52  0.84  0.30  0.38  0.19
Turkey           0.87  1.00  0.99  0.00  0.87  0.33  1.00  0.05  0.60  0.00
Ukraine          0.00  0.38  0.94  0.08  0.00  0.00  0.00  0.31  0.00  0.05

Table 1: Fuzzy membership values

Figure 1: Fuzzy sets A, B, C and A^c

Figure 2: Fuzzy sets A, B, C and A^c

Figure 3: Fuzzy sets A, B, C and A^c

Figure 4: The result of k-means clustering of countries based on the distance matrix D. The
horizontal axis displays the cluster number (there are five clusters). The vertical axis shows the
distance of each point in a cluster to the mean of the cluster.

Figure 5: Fuzzy sets representation of the countries in Cluster #1.

Figure 6: Fuzzy sets representation of the countries in Cluster #2.

Figure 7: Fuzzy sets representation of some of the countries in Cluster #3.